Predicting tryptic cleavage from proteomics data using decision tree ensembles.
نویسندگان
چکیده
Trypsin is the workhorse protease in mass spectrometry-based proteomics experiments and is used to digest proteins into more readily analyzable peptides. To identify these peptides after mass spectrometric analysis, the actual digestion has to be mimicked as faithfully as possible in silico. In this paper we introduce CP-DT (Cleavage Prediction with Decision Trees), an algorithm based on a decision tree ensemble that was learned on publicly available peptide identification data from the PRIDE repository. We demonstrate that CP-DT is able to accurately predict tryptic cleavage: tests on three independent data sets show that CP-DT significantly outperforms the Keil rules that are currently used to predict tryptic cleavage. Moreover, the trees generated by CP-DT can make predictions efficiently and are interpretable by domain experts.
منابع مشابه
Provide a Predictive Model to Identify People with Diabetes Using the Decision Tree
Background: Today, in most hospitals in Iran, there is an extensive database of patient characteristics that includes a large amount of information related to medical, family and medical records. Finding a knowledge model of this information can help to predict the performance of the medical system and improve educational processes. Methods: Data mining techniques are analytical tools that are...
متن کاملDetermining Factors Influencing Length of Stay and Predicting Length of Stay Using Data Mining in the General Surgery Department
Background: Length of stay is one of the most important indicators in assessing hospital performance. A shorter stay can reduce the costs per discharge and shift care from inpatient to less expensive post-acute settings. It can lead to a greater readmission rate, better resource management, and more efficient services. Objective: This study aimed to ident...
متن کاملPredicting The Type of Malaria Using Classification and Regression Decision Trees
Predicting The Type of Malaria Using Classification and Regression Decision Trees Maryam Ashoori1 *, Fatemeh Hamzavi2 1School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran 2School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran Abstract Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...
متن کاملInvestigation of Property Valuation Models Based on Decision Tree Ensembles Built over Noised Data
The ensemble machine learning methods incorporating bagging, random subspace, random forest, and rotation forest employing decision trees, i.e. Pruned Model Trees, as base learning algorithms were developed in WEKA environment. The methods were applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions. T...
متن کاملمطالعات درخت تصمیم در برآورد ریسک ابتلا به سرطان سینه با استفاده از چند شکلیهای تک نوکلوئیدی
Abstract Introduction: Decision tree is the data mining tools to collect, accurate prediction and sift information from massive amounts of data that are used widely in the field of computational biology and bioinformatics. In bioinformatics can be predict on diseases, including breast cancer. The use of genomic data including single nucleotide polymorphisms is a very important ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of proteome research
دوره 12 5 شماره
صفحات -
تاریخ انتشار 2013